Improving Scalability with Loop Transformations and

نویسندگان

  • Federico Bassetti
  • Kei Davis
  • Dan Quinlan
چکیده

Application codes reliably achieve performance far less than the advertised capabilities of existing architectures, and this problem is worsening with increasingly-parallel machines. For large-scale numerical applications, stencil operations are often impose the greater part of the computational cost, and the primary sources of ineeciency are the costs of message passing and poor cache utilization. This paper proposes and demonstrates optimizations for stencil and stencil-like computations for both serial and parallel environments that ameliorate these sources of ineeciency. Achieving scalability, we believe, requires both algorithm design and compile-time support. The optimizations we present are automatable because our stencil-like computations are implemented at a high level of abstraction using object-oriented parallel array class libraries. These optimizations, which are beyond the capabilities of today compilers, may be performed automatically by a preprocessor such as the one we are currently developing.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

On the Scalability of Loop Tiling Techniques

The Polyhedral model has proven to be a valuable tool for improving memory locality and exploiting parallelism for optimizing dense array codes. This model is expressive enough to describe transformations of imperfectly nested loops, and to capture a variety of program transformations, including many approaches to loop tiling. Tools such as the highly successful PLuTo automatic parallelizer hav...

متن کامل

International Workshop on Polyhedral Compilation Techniques

The Polyhedral model has proven to be a valuable tool for improving memory locality and exploiting parallelism for optimizing dense array codes. This model is expressive enough to describe transformations of imperfectly nested loops, and to capture a variety of program transformations, including many approaches to loop tiling. Tools such as the highly successful PLuTo automatic parallelizer hav...

متن کامل

Multi-dimensional Incremental Loop Fusion for Data Locality

Affine loop transformations have often been used for program optimization. Usually their focus lies on single loop nests. A few recent approaches also handle global programs with multiple loop nests but they are not really scalable towards realistic applications with dozens of nests. To reduce complexity, we split affine transformations into a linear transformation step and a translation step. ...

متن کامل

Multi-dimentsional Incremetal Loops Fusion for Data Locality

Affine loop transformations have often been used for program optimization. Usually their focus lies on single loop nests. A few recent approaches also handle global programs with multiple loop nests but they are not really scalable towards realistic applications with dozens of nests. To reduce complexity, we split affine transformations into a linear transformation step and a translation step. ...

متن کامل

Impact of ILP-improving Code Transformations on Loop Buffer Energy

For multimedia applications, loop buffering is an efficient mechanism to reduce the power in the instruction memory of embedded processors. In particular, software controlled clustered loop buffers are very energy efficient. However code transformations needed in VLIW compilers to reach a higher ILP potentially may have a large negative influence on the energy consumed in the instruction memori...

متن کامل

Performance and Scalability Analysis of Cray X1 Vectorization and Multistreaming Optimization

Cray X1 Fortran and C/C++ compilers provide a number of loop transformations, notably vectorization and multistreaming, in order to exploit the multistreaming processor (MSP) hardware resources and its high memory bandwidth. A Cray X1 node is composed of four MSPs, which in turn are composed of four single streaming processors (SSP). Each SSP contains a superscalar processing unit and two vecto...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007